Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate 0-layer thermodynamics in the CICE driver #732

Merged
merged 5 commits into from
Jul 31, 2022

Conversation

eclare108213
Copy link
Contributor

@eclare108213 eclare108213 commented Jul 22, 2022

  • Short (1 sentence) summary of your PR:
    In this initial deprecation step, the zero-layer (ktherm=0) thermodynamics option is removed via cpp flags so that it can easily be reinstated if necessary. Documentation is also updated.

The intent is for this cpp'd code to be released, and then for these changes to be made permanent in a followup release, removing all of the cpp'd code and the cpp flags in both CICE and Icepack. The initial code release allows for feedback from the user community earlier during the process, before the proposed changes become more difficult to reinstate. (So speak up!)

  • Developer(s):
    @eclare108213
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.

See https://github.com/CICE-Consortium/Test-Results/wiki/7b5538dd9a.badger.intel.22-07-28.174752.0.

373 measured results of 373 total results
293 of 373 tests PASSED
0 of 373 tests PENDING
54 of 373 tests MISSING data
26 of 373 tests FAILED

See comments below for explanation.

  • How much do the PR code changes differ from the unmodified code?
    • bit for bit except for testing options in which ktherm=0 changed to ktherm=1 and krdg_partic and krdg_redist were removed
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No but it changes several test cases that invoked ktherm=0, krdg_partic=0, krdg_redist=0.
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:
    This PR uses the UNDEPRECATE_0LAYER cpp flag to remove the 0-layer thermodynamics (ktherm=0) option, as in the associated Icepack PR #394. It also removes the internally defined heat_capacity option, which is always false when ktherm=0 and true for the other thermodynamic options. The only namelist change for the 0-layer change is to ktherm. Various set_nml option files that had ktherm=0 were changed to ktherm=1 (BL99). The cice4 restart conversion fortran code is also modified slightly.

To undeprecate the 0-layer option, set -DUNDEPRECATE_0LAYER in cice.settings' CPPDEFS and adjust namelist values as needed.

This PR also updates a machine file for badger.

Code and documentation related to the old ridging participation and redistribution functions was also deprecated, and then reverted (not deprecated). See comments in Icepack#394.

Copy link
Contributor

@dabail10 dabail10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@eclare108213
Copy link
Contributor Author

eclare108213 commented Jul 22, 2022

Failed tests on badger

These always fail because 40 pes are not available:
FAIL badger_intel_restart_gx1_40x4_droundrobin_medium run
FAIL badger_intel_restart_gx1_40x4_droundrobin_medium test
FAIL badger_intel_restart_tx1_40x4_dsectrobin_medium run
FAIL badger_intel_restart_tx1_40x4_dsectrobin_medium test

These are expected to fail with ktherm=0 in the baseline, 1 in the new test:
FAIL badger_intel_restart_gx3_6x2_alt01 complog depr_baseline different-data
FAIL badger_intel_restart_gx3_6x2_alt01 compare depr_baseline 11.58 6.94 2.95 different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_short compare depr_baseline 13.47 9.23 3.14 different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_debug_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_debug_short compare depr_baseline 216.11 194.49 13.21 different-data
FAIL badger_intel_restart_gbox128_2x2_boxadv_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_2x2_boxadv_short compare depr_baseline 159.60 130.20 20.78 different-data
FAIL badger_intel_restart_gbox128_4x4_boxrestore_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x4_boxrestore_short compare depr_baseline 20.88 13.38 4.00 different-data
FAIL badger_intel_smoke_gbox128_4x4_boxrestore_debug_short complog depr_baseline different-data
FAIL badger_intel_smoke_gbox128_4x4_boxrestore_debug_short compare depr_baseline 63.51 51.82 3.25 different-data

This one is failing in EAP:
FAIL badger_intel_smoke_gbox128_2x2_boxadv_debug_short run -1 -1 -1
FAIL badger_intel_smoke_gbox128_2x2_boxadv_debug_short test

forrtl: severe (408): fort: (3): Subscript #1 of the array S11R has value -29 which is less than the lower bound of 1
Image PC Routine Line Source
cice 0000000001E16FDF Unknown Unknown Unknown
cice 0000000000803054 ice_dyn_eap_mp_up 1827 ice_dyn_eap.F90

These tests also failed in the baseline case (segfault) as well as this test suite, and so need to be debugged but are probably independent of the changes in this PR:
FAIL badger_intel_smoke_gx3_8x2_bgcz run -1 -1 -1
FAIL badger_intel_smoke_gx3_8x2_bgcz test
FAIL badger_intel_smoke_gx3_8x2_bgcz_debug run -1 -1 -1
FAIL badger_intel_smoke_gx3_8x2_bgcz_debug test
FAIL badger_intel_restart_gx1_4x2_bgcsklclim_medium run
FAIL badger_intel_restart_gx1_4x2_bgcsklclim_medium test

54 tests are showing missing data. The baseline suite shows the runs as still pending even though they are not in the queue. Badger seems very flaky. @dabail10 @apcraig could one of you launch a CICE test from this branch, please?

Edit for clarification: please run a base_suite with regression

I'll look into the EAP out-of-bounds error and the bgc segfaults.

@dabail10
Copy link
Contributor

I am running the full test suite on the ponds on cheyenne. I will try to get to this one on Monday. Did you also try running with setting -DUNDEPRECATE_0layer?

@eclare108213
Copy link
Contributor Author

I have not yet tried -DUNDEPRECATE_0layer - will try that on a single run now.

@eclare108213
Copy link
Contributor Author

eclare108213 commented Jul 25, 2022 via email

@dabail10
Copy link
Contributor

I will test it out now. I ran the cesmponds over the weekend.

@dabail10
Copy link
Contributor

A lot of FAILS here, but most are non-bfb and are consistent with the change from ktherm=0 to 1 in alt01, boxadv, boxnodyn, and boxrestore. Also, the dynpicard have different answers. The only ones that are not different data is:

FAIL cheyenne_intel_smoke_gbox128_2x2_boxadv_debug_short run -1 -1 -1
FAIL cheyenne_intel_smoke_gbox128_2x2_boxadv_debug_short test
FAIL cheyenne_gnu_smoke_gbox128_2x2_boxadv_debug_short run -1 -1 -1
FAIL cheyenne_gnu_smoke_gbox128_2x2_boxadv_debug_short test

Here is the error in one:

At line 1827 of file /glade/work/dbailey/CICE_elizabeth/cicecore/cicedynB/dynamics/ice_dyn_eap.F90
Fortran runtime error: Index '-29' of dimension 1 of array 's11r' below lower bound of 1

@eclare108213
Copy link
Contributor Author

Thank you. I will look into the EAP array error.

@eclare108213 eclare108213 changed the title Deprecate 0-layer thermodynamics in the CICE driver Deprecate 0-layer thermodynamics and old ridging functions in the CICE driver Jul 28, 2022
@eclare108213
Copy link
Contributor Author

eclare108213 commented Jul 28, 2022

FAILed test results with both sets of deprecations, on badger:

These always fail because 40 pes are not available on badger:
FAIL badger_intel_restart_gx1_40x4_droundrobin_medium run
FAIL badger_intel_restart_gx1_40x4_droundrobin_medium test
FAIL badger_intel_restart_tx1_40x4_dsectrobin_medium run
FAIL badger_intel_restart_tx1_40x4_dsectrobin_medium test

These are expected to fail with ktherm=0 in the baseline, 1 in the new test:
FAIL badger_intel_restart_gx3_6x2_alt01 complog depr_baseline different-data
FAIL badger_intel_restart_gx3_6x2_alt01 compare depr_baseline 11.58 6.94 2.95 different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_short compare depr_baseline 13.47 9.23 3.14 different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_debug_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x2_boxnodyn_debug_short compare depr_baseline 216.11 194.49 13.21 different-data

These are expected to fail with krdg_partic=0 and krdg_redist=0 in the baseline, defaults to 1 in the new test:
FAIL badger_intel_restart_gx3_8x2_alt02 complog depr_baseline different-data
FAIL badger_intel_restart_gx3_8x2_alt02 compare depr_baseline 4.40 2.28 0.90 different-data

These are expected to fail due to the namelist changes for both sets of deprecations. The boxadv runs are now completing but with different data, as expected:
FAIL badger_intel_restart_gbox128_4x4_boxrestore_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_4x4_boxrestore_short compare depr_baseline 20.88 13.38 4.00 different-data
FAIL badger_intel_smoke_gbox128_4x4_boxrestore_debug_short complog depr_baseline different-data
FAIL badger_intel_smoke_gbox128_4x4_boxrestore_debug_short compare depr_baseline 63.51 51.82 3.25 different-data
FAIL badger_intel_restart_gbox128_2x2_boxadv_short complog depr_baseline different-data
FAIL badger_intel_restart_gbox128_2x2_boxadv_short compare depr_baseline 159.60 130.20 20.78 different-data
FAIL badger_intel_smoke_gbox128_2x2_boxadv_debug_short complog depr_baseline different-data
FAIL badger_intel_smoke_gbox128_2x2_boxadv_debug_short compare depr_baseline 261.18 226.25 14.48 different-data

These tests also failed in the baseline case (segfault) as well as this test suite, and so are probably independent of the changes in this PR:
FAIL badger_intel_smoke_gx3_8x2_bgcz run -1 -1 -1
FAIL badger_intel_smoke_gx3_8x2_bgcz test
FAIL badger_intel_smoke_gx3_8x2_bgcz_debug run -1 -1 -1
FAIL badger_intel_smoke_gx3_8x2_bgcz_debug test
FAIL badger_intel_restart_gx1_4x2_bgcsklclim_medium run
FAIL badger_intel_restart_gx1_4x2_bgcsklclim_medium test

I think these results are acceptable, although there are a lot of missing regression tests so I'm not 100% confident that everything is BFB. I also tested that the code compiles with the UNDEPRECATE flags set. Results will be different due to the namelist changes (not tested).

@eclare108213 eclare108213 marked this pull request as ready for review July 28, 2022 21:51
@eclare108213 eclare108213 changed the title Deprecate 0-layer thermodynamics and old ridging functions in the CICE driver Deprecate 0-layer thermodynamics in the CICE driver Jul 29, 2022
@eclare108213
Copy link
Contributor Author

New test results from badger with just the 0-layer deprecation:

375 measured results of 375 total results
297 of 375 tests PASSED
2 of 375 tests PENDING
56 of 375 tests MISSING data
20 of 375 tests FAILED

Same failures as above, except now the alt02_debug_short test is missing and the other alt02 test (which failed above) now passes. Cheyenne tests are underway and more reliable.

@apcraig
Copy link
Contributor

apcraig commented Jul 29, 2022

Test results from cheyenne look fine, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#11136cb0cf7a875efd1c08019b014bbc2a96ec4d. Looks like alt01, boxnodyn, boxadv, and boxrestore changed answers.

@apcraig apcraig merged commit 3af3d1b into CICE-Consortium:main Jul 31, 2022
dabail10 pushed a commit to ESCOMP/CICE that referenced this pull request Oct 4, 2022
)

* initial 0-layer thermo deprecation

* capitalization matters for cpps

* set_nml.boxadv needs thermo turned on

* deprecate old ridging participation and redistribution functions

* Revert "deprecate old ridging participation and redistribution functions"

This reverts commit 95c289a.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants